Validity and reliability of the Professionalism Assessment Scale in Turkish medical students

Medical professionalism is a basic competency in medical education. This study aimed to adapt the Professionalism Assessment Scale, which is used to evaluate the professionalism attitudes of medical students, into Turkish and to assess its validity and reliability. First, the scale’s translation-back-translation was performed and piloted on 30 students. Then, the final scale was applied to medical students to ensure the scale’s validity. The Penn State University College of Medicine Professionalism Questionnaire was used for external validation to assess criterion validity. Confirmatory factor analysis was performed for construct validity. Test-retest, item correlations, split-half analysis, and Cronbach’s alpha coefficient were evaluated to determine the scale’s reliability. SPSS 25.0 and AMOS 24.0 package programs were used for statistical analysis. The statistical significance level was accepted as P<0.05. Three hundred thirty-five students were invited, and 329 participated in the study. The response rate was 98%. The mean age of the participants was 21±2 years, and 50.5% (n = 166) were female. The mean total Professionalism Assessment Scale score was 96.36±12.04. The three-factor structure of the scale, “empathy and humanism,” “professional relationship and development,” and “responsibility,” was confirmed. The Cronbach’s alpha coefficient of the scale was 0.94, and both the Spearman-Brown and Guttman split-half coefficients were 0.89. The three-factor structure of the scale, consisting of 22 items, explained 59.1% of the total variance. The intraclass correlation coefficient between test-retest measurements was 0.81. Confirmatory factor analysis supported a model consistent with the original version of the scale (χ²/df = 2.814, RMSEA = 0.074). The Turkish version of the Professionalism Assessment Scale is a valid and reliable tool to determine the professionalism attitudes of medical students in Turkey.


Introduction
Medical professionalism is the entirety of doctors' behaviors needed to earn the patients' and the public's trust while working for their benefit [1,2]. With the evolution of the physician's role from healer to professional in the last 20 years, medical professionalism has become one such as empathy, humanism, professional relationships and responsibility. It has been reported that the scale is a valid and reliable tool that can be used both for the summative and formative evaluation of medical students' professional attitudes and for the self-evaluation of students [28]. The PAS is an easy-to-use tool that can be answered quickly, with fewer items than PSCOM. To the best of our knowledge, there is no Turkish or other version of the scale. This study aimed to adapt the PAS [28] tool to Turkish and conduct a validity and reliability study.

Study design
This is a two-stage, methodological, observational validation study. In the first stage, Z. Klemenc-Ketis was contacted via e-mail, and permission was obtained to adapt the scale into Turkish [28]. Then, approval for the study was obtained from the Atatürk University Faculty of Medicine Clinical Research Ethics Committee (No: 04/72. Date: 04.11.2021). The study was carried out in accordance with the Declaration of Helsinki.

Setting and participants
The study population consisted of Atatürk University Faculty of Medicine third-year students (n = 365). Since the theoretical courses on professionalism started in the third year of our education program, third-year students were selected as the study group. Thirty students who participated in the pilot study were excluded from the study. Thus, the study was carried out with 335 students. All students were invited to participate in the survey. Data from six students who did not complete the questionnaires were excluded from the study, and the complete data of 329 students were evaluated. Fifteen days after the first test, the same scale was sent to the students for a retest to determine the scale's reliability, and the correlation of the responses was evaluated. Seventy-five students participated in the retest. The participation rate was 98% for the first test and 22% for the retest. Students who volunteered and gave consent were included. Data were collected through an online questionnaire prepared with Google Forms. The students were informed about the purpose and scope of the study via e-mail, and the online survey link was shared in the WhatsApp class group, which included all the students. Information about the aim of the study was also included at the beginning of the questionnaire. The participants were asked to give their consent to the statement, "I voluntarily accept participation in the study". Those who did not give consent could not answer the questions. Thus, online consent from the participants was obtained.
The study was carried out between 01.12.2021 and 15.12.2021. During this period, three reminder messages were sent to the students. Identity information was not requested from the students, and data were collected anonymously. However, to match the questionnaires, the students were asked to write the last four digits of their phone numbers. The inclusion criteria for the study were determined to be being a third-year student, volunteering, and giving consent. The questionnaire took approximately 15 minutes.

Sample size
The minimum sample size required for our study was calculated as 180, assuming a Cronbach's alpha of 0.80, a 95% confidence level, and 80% power. Allowing for a 10% loss, the target sample size was set at 200.
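The adjustment for expected loss can be illustrated with a short calculation. The sketch below assumes the common approach of dividing the required sample by the expected retention rate; the text does not state which formula was used, and the function name is hypothetical.

```python
def inflate_for_attrition(n_required, loss_percent):
    """Smallest n such that n * (1 - loss) still covers n_required.

    Integer arithmetic (ceiling division) avoids floating-point rounding.
    Assumption: the inflation rule is n_required / (1 - loss); the source
    only reports the inputs (180, 10%) and the result (200).
    """
    remaining = 100 - loss_percent
    return -(-n_required * 100 // remaining)


if __name__ == "__main__":
    target = inflate_for_attrition(180, 10)
    print("Target sample size:", target)
    # The 329 complete responses reported in the study exceed this target.
    print("Target reached:", 329 >= target)
```

Under this rule, 180 required responses and a 10% expected loss give the target of 200 reported above.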

Data collection tools
The data collection tool included the PAS, PSCOM Professionalism Attitude Scale Student Form (PSCOM-SF), and sociodemographic questions such as age, sex, reasons for choosing a medical school, and postgraduate plan.
Professionalism Assessment Scale. The PAS is a 22-item self-assessment tool that evaluates the professional attitudes of medical students [28]. Items are rated on a 5-point Likert scale: strongly disagree (1), disagree (2), undecided (3), agree (4), and strongly agree (5). None of the items are reverse scored. The scale has three dimensions: 1) empathy and humanism (EH), 2) professional relationship and development (PR-D), and 3) responsibility (R). The total score obtained from the scale ranges between 22 and 110. Dimension scores are 10-50 points for EH, 8-40 points for PR-D, and 4-20 points for R. Higher scores indicate more positive attitudes toward professionalism. Cronbach's alpha value of the original scale was determined to be 0.88.
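The possible score ranges follow directly from the number of items in each dimension and the 1-5 response format. The following minimal sketch derives them; the item counts per dimension (10, 8, and 4) are inferred from the stated minimum dimension scores, and the function names are illustrative only.

```python
# Item counts per PAS dimension, inferred from the minimum dimension scores.
ITEMS_PER_DIMENSION = {"EH": 10, "PR-D": 8, "R": 4}
LIKERT_MIN, LIKERT_MAX = 1, 5  # 5-point Likert items, none reverse scored


def score_range(n_items):
    """Return the (minimum, maximum) possible sum score for n_items items."""
    return n_items * LIKERT_MIN, n_items * LIKERT_MAX


def all_ranges():
    """Possible score range for each dimension and for the total scale."""
    ranges = {dim: score_range(n) for dim, n in ITEMS_PER_DIMENSION.items()}
    ranges["total"] = score_range(sum(ITEMS_PER_DIMENSION.values()))
    return ranges


if __name__ == "__main__":
    for name, (low, high) in all_ranges().items():
        print(f"{name}: {low}-{high}")
```

With 22 items in total, the maximum attainable score is 110, which is consistent with the reported mean total score of 96.36.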
PSCOM Professionalism Attitude Scale-student form. The PSCOM-SF is a scale developed by Penn State University College of Medicine (2007) to evaluate medical students' professionalism attitudes. Its Cronbach's alpha was determined to be 0.51-0.78 [23]. It was adapted into Turkish by Demirören and Öztuna (2015), and Cronbach's alpha levels of the subscales ranged from 0.46 to 0.76 [24]. There are 7 dimensions and 36 items on the scale: accountability, enrichment, equity, honor and integrity, altruism, duty, and respect. Scale items are evaluated according to a 5-point Likert system (never (1 point) ... always (5 points)). None of the items in the scale are reverse scored. The total score obtained from the scale is between 36 and 180. Dimension scores can be calculated separately. A higher score on the scale points to a more positive attitude toward professionalism [24].

Procedures performed within the scope of the scale's Turkish adaptation
Language adaptation. The translation of the scale into Turkish was made in line with the recommendations of Hilton and International Test Commission guidelines [29,30]. First, two independent experts were asked to translate the original scale into Turkish. Both were fluent in English and Turkish, familiar with Turkish culture, and acquainted with medical terminology through teaching foreign languages at different medical faculties. Before the translation, one of the translators was informed about the subject, the study's purpose, how the scale was used, and how articles about the scale were conveyed. The other translator was only asked to translate the scale. The researchers compared the Turkish translations in terms of meaning and grammar, and it was determined that the translated form was not different from the original. A Turkish language expert was also consulted on the Turkish form created. Thus, the first Turkish version of the scale was obtained.
After this stage, the Turkish scale was translated back into English by an expert who was a native English speaker, was fluent in Turkish, had no knowledge of the scale, and was not involved in the first translation. Researchers compared the two English versions to determine the differences between the back-translation and the original scale and found no semantic differences other than minor grammatical differences. In the translation phase, consistency in meaning was prioritized over word-for-word translation of the scale items. The Turkish version of the scale is presented in Table 1.
Before starting data collection, a pilot study was conducted with 30 students. The sample of the pilot application was created to be similar to the target group in terms of characteristics such as age range and sex. Participants in the pilot study were asked to read the scale items aloud and briefly explain the meaning of each. Thus, whether the students had difficulty understanding and whether there was a difference in meaning was determined. Then, the scale was given its final shape and applied to the students. Participants in the pilot study were excluded.

Statistical analyses
SPSS v25.0 (Statistical Package for the Social Sciences) and AMOS v24.0 (Analysis of Moment Structures) package programs were used for the scale's validity and reliability analysis. Demographic data are given as descriptive statistics. Sociodemographic characteristics are presented as the mean ± standard deviation (SD) or as numbers and percentages. Scale scores are given as the mean ± SD. Initially, in the validity analysis, Kaiser-Meyer-Olkin (KMO) and Bartlett sphericity tests were performed to evaluate whether the data were suitable for factor analysis. Then, Hotelling's T² test was used to test the differences in the mean item scores, and confirmatory factor analysis (CFA) was used to test the construct validity. Cronbach's alpha coefficient, split-half analysis, and Guttman split-half and Spearman-Brown coefficients were analyzed for reliability. Test-retest reliability was assessed with the intraclass correlation coefficient. Correlation analyses and Cronbach's alpha were used for the scale's internal consistency. The number of factors was determined by eigenvalues >1 and scree plots. The fit of the first-level CFA model was evaluated with the following indices: chi-square statistic (χ²), chi-square to degrees of freedom ratio (CMIN/DF), goodness-of-fit index (GFI), incremental fit index (IFI), comparative fit index (CFI), root mean square error of approximation (RMSEA), standardized root mean square residual (SRMR), and Tucker-Lewis index (TLI). A P level of <0.05 was considered significant.
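For readers who wish to reproduce the reliability coefficients outside SPSS, the following is a minimal sketch of Cronbach's alpha and the two split-half coefficients. It is not the authors' procedure: the odd/even item split, the function names, and the toy responses are assumptions made for illustration, and SPSS may split the items differently.

```python
from math import sqrt
from statistics import mean, variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a respondents-by-items table of scores."""
    k = len(rows[0])
    item_vars = [variance([row[i] for row in rows]) for i in range(k)]
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def pearson_r(x, y):
    """Pearson correlation between two equally long score lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


def spearman_brown(r_half):
    """Step a half-test correlation up to full-test length."""
    return 2 * r_half / (1 + r_half)


def split_half(rows):
    """Return (Spearman-Brown, Guttman) coefficients for an odd/even split.

    Assumption: items are split by alternating position.
    """
    half_a = [sum(row[0::2]) for row in rows]
    half_b = [sum(row[1::2]) for row in rows]
    total = [a + b for a, b in zip(half_a, half_b)]
    sb = spearman_brown(pearson_r(half_a, half_b))
    guttman = 2 * (1 - (variance(half_a) + variance(half_b)) / variance(total))
    return sb, guttman


if __name__ == "__main__":
    # Hypothetical responses of five students to four Likert items.
    toy = [[4, 5, 4, 5], [3, 3, 4, 3], [5, 5, 5, 4], [2, 3, 2, 3], [4, 4, 5, 5]]
    print("alpha:", round(cronbach_alpha(toy), 3))
    print("split-half (SB, Guttman):", split_half(toy))
```

The Spearman-Brown and Guttman coefficients coincide when the two halves have equal variances, which is one way both can take the same value in a given sample.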

Results
Of the 335 invited students, 329 participated in the study. The response rate was 98%.

Characteristics of participants
The participants' mean age (±SD) was 21±2 years, and 50.5% were female. The sociodemographic characteristics of the students are shown in Table 2.

Findings regarding the validity of the scale
Bartlett's test of sphericity and Kaiser-Meyer-Olkin measure. The KMO value was 0.956, and Bartlett's test of sphericity was statistically significant (approx. chi-square = 4297.828, degrees of freedom (DF) = 231, P <0.001). Thus, it was determined that the PAS was suitable for factor analysis. Factor analysis showed that the items grouped into three factors with eigenvalues above 1.00. Of these, Factor 1 explained 48.3% of the total variance, Factor 2 explained 6%, and Factor 3 explained 4.8%. The 22-item PAS-TR explained 59.16% of the total variance. The scree plot indicated that the scale had three factors and that factors after the third were not explanatory (Fig 1).
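Two of the reported values can be checked by hand. The sketch below computes the degrees of freedom of Bartlett's test for 22 items and sums the variance explained by the three factors. It also converts the percentages into approximate eigenvalues, assuming the factors were extracted from the correlation matrix so that the eigenvalues sum to the number of items; the extraction method is not stated in the text, and the function names are illustrative.

```python
N_ITEMS = 22
EXPLAINED_PERCENT = [48.3, 6.0, 4.8]  # Factors 1-3, as reported


def bartlett_df(n_items):
    """Degrees of freedom of Bartlett's test: number of distinct item pairs."""
    return n_items * (n_items - 1) // 2


def implied_eigenvalues(percentages, n_items):
    """Approximate eigenvalues, assuming a correlation-matrix extraction."""
    return [p / 100 * n_items for p in percentages]


if __name__ == "__main__":
    print("Bartlett DF:", bartlett_df(N_ITEMS))
    print("Cumulative variance (%):", round(sum(EXPLAINED_PERCENT), 1))
    print("Implied eigenvalues:",
          [round(ev, 2) for ev in implied_eigenvalues(EXPLAINED_PERCENT, N_ITEMS)])
```

The rounded percentages sum to 59.1%, in line with the reported 59.16%, and under the stated assumption all three implied eigenvalues exceed 1.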
Each item constituting the factors makes a statistically significant contribution to the model (P<0.05).
Model fit of the scale. According to the goodness-of-fit analysis of the first-level CFA model, the model was compatible with the original structure of the scale (χ²/df = 2.81, CFI = 0.91, TLI = 0.90, RMSEA = 0.07). In our study, RMSEA, TLI, and CFI showed an acceptable fit, and SRMR showed an adequate fit. The reference values of the frequently used goodness-of-fit indexes in the literature and the goodness-of-fit analysis results are shown in Table 3. (Table 4). Cronbach's alpha values of the two halves were 0.88 and 0.91 in the split-half analysis performed to measure the scale's reliability. The split-half analysis findings are given in Table 5.
The analysis of variance, performed to determine whether the scale items are additive, showed the scale to be additive (nonadditivity: F = 1.193, P = 0.275). A significant difference was observed between the measurements (between measures, F = 32.21, P <0.001). The equality of the item means was tested with Hotelling's T² test, and a significant difference was found between the means (Hotelling's T² = 450.201, F = 20.131, P <0.001).

Students' PAS-TR and PSCOM-SF scores
Students' PAS-TR mean scores were above four for all items on the scale. Scale items, dimensions, and mean scores are shown in Table 6.
The PAS-TR total score was 96.36±12.04, and the dimension scores were 44.95±5.80 for EH, 33.94±4.60 for PR-D, and 17.47±2.43 for R.
The PSCOM-SF total score was 155.27±16.75. There was a significant correlation between scale scores for all dimensions (Table 7).
The PAS-TR (97.59±12.12 vs. 94.40±11.69, P = 0.001) and PSCOM-SF (158.94±14.40 vs. 149.32±18.56, P <0.001) scores of students who chose medical school because it was ideal and to help others were significantly higher than those of students who chose it for other reasons. There was no significant difference between the sexes.

Discussion
Our results showed that the PAS-TR is a valid and reliable scale that can determine medical students' attitudes toward professionalism in Turkish society. In scale adaptation studies, Bartlett's test of sphericity and Kaiser-Meyer-Olkin (KMO) measurements are used to demonstrate sample size adequacy and evaluate the scale's fit for factor analysis. The KMO value should be higher than the threshold value of 0.6, and Bartlett's test of sphericity should be significant [31,32]. In our study, the KMO value was 0.956, and Bartlett's test of sphericity was significant. Thus, it can be said that the scale can effectively measure the phenomenon, there is a correlation between the variables, and the data are suitable for factor analysis. Depending on sample size, factor loadings should be at least 0.30 to support construct validity [33,34]. In our study, the factor loading values indicated construct validity. CFA is used to test the validity of measurement tools developed in other samples and cultures [35]. CFA was performed to determine whether the Turkish sample could confirm the scale's factor structure and construct validity. Various model fit indexes were used to investigate the fit of PAS-TR to the data. First, the CMIN/DF ratio was calculated by dividing the chi-square value by the degrees of freedom. A χ²/df value below 5 is considered an adequate model fit [36]. In our study, this ratio was 2.81. According to the goodness-of-fit analysis results of the first-level CFA model, the model was consistent with the original structure of the scale (χ²/df = 2.814, RMSEA = 0.07). A CFI of 0.90 and above indicates an acceptable fit, while a CFI greater than 0.95 is considered a perfect fit [37,38]. Similarly, a TLI of 0.90 or greater is an acceptable fit, while a TLI greater than 0.95 indicates a perfect fit [39]. An RMSEA index of less than 0.08 is acceptable, and an index of less than 0.05 is considered excellent [40,41]. In our study, RMSEA, TLI, and CFI showed an acceptable fit, while CMIN/DF and SRMR showed an adequate fit.
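The cut-offs cited above can be applied mechanically, and the reported RMSEA can be cross-checked from the χ²/df ratio and the sample size. The sketch below assumes the common point-estimate formula RMSEA = sqrt(max(χ²/df − 1, 0) / (N − 1)); some software divides by N instead, and the text does not state which convention AMOS applied here. The function names and the "below threshold" label are illustrative.

```python
from math import sqrt


def rmsea_from_ratio(chi2_df_ratio, n):
    """RMSEA point estimate from the chi-square/df ratio and sample size n."""
    return sqrt(max(chi2_df_ratio - 1.0, 0.0) / (n - 1))


def classify_incremental(value):
    """CFI or TLI: >0.95 perfect, >=0.90 acceptable (cut-offs cited above)."""
    if value > 0.95:
        return "perfect"
    if value >= 0.90:
        return "acceptable"
    return "below threshold"


def classify_rmsea(value):
    """RMSEA: <0.05 excellent, <0.08 acceptable (cut-offs cited above)."""
    if value < 0.05:
        return "excellent"
    if value < 0.08:
        return "acceptable"
    return "below threshold"


def classify_chi2_df(ratio):
    """Chi-square/df: a value below 5 is considered an adequate fit."""
    return "adequate" if ratio < 5 else "below threshold"


if __name__ == "__main__":
    rmsea = rmsea_from_ratio(2.814, 329)  # reported ratio and sample size
    print("RMSEA:", round(rmsea, 3), classify_rmsea(rmsea))
    print("CFI 0.91:", classify_incremental(0.91))
    print("TLI 0.90:", classify_incremental(0.90))
    print("chi2/df 2.81:", classify_chi2_df(2.81))
```

With χ²/df = 2.814 and 329 participants, this formula reproduces the reported RMSEA of 0.074 to three decimals.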
Through factor analysis, the three-factor structure of the original scale was confirmed. While the items collected in Factor 1 were associated with the "empathy and humanism" subdomain of the original scale, the items under Factor 2 were associated with the "professional relationship and development" dimension, and the items under Factor 3 were associated with the "responsibility" subdomain. Some items in the original scale were included in at least two factors and were placed in the most appropriate subdomain. This situation is associated with professionalism's interrelatedness and often overlapping characteristics [28]. In the current study, since all items were under the same factors as the original structure, the names of the dimensions were kept and formatted in parallel with the original scale [28].
The variance explained in the scales should be at least 50% of the total variance; representativeness cannot be asserted if it explains any less [35]. In our study, the PAS-TR explained 59.1% of the total variance. The higher the explained variance is, the better a concept or construct is measured [35]. In the original scale, the full scale explained 46.8% of the variance [28].
Empathy and humanism are core values of medical professionalism [28,42]. As in the original scale, empathy and humanism accounted for the largest share of the variance in our study and emerged as the main factor. The "professional relationship and development" dimension accounted for the second-largest share of the variance. Since this subdomain covers continuing professional development, lower scores from undergraduate students could be expected, but our study did not confirm this. Our results are consistent with the literature [43,44]. Responsibility, the third dimension of the scale, is a vital component of professionalism. While providing health services, a physician is responsible for the patient, society, and profession.
The intraclass correlation, which reflects temporal consistency and reliability, was significant in the test-retest measurements. This result shows that the scale measurements are consistent over time. Furthermore, in the test and retest, the internal consistency of the scale items and dimensions was appropriate.
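As an illustration of how a test-retest intraclass correlation can be obtained, the sketch below computes a single-measure, consistency-type coefficient from a two-way analysis of variance, often written ICC(3,1). The choice of this model is an assumption: the text does not report which ICC form was used, and the function name and toy scores are hypothetical.

```python
def icc_consistency(scores):
    """ICC(3,1): two-way model, consistency, single measurement.

    scores: one row per respondent, one column per occasion (test, retest).
    Assumption: this ICC form is chosen for illustration only.
    """
    n = len(scores)
    k = len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)


if __name__ == "__main__":
    # Hypothetical total scores of four students at test and retest.
    toy = [[90, 92], [100, 95], [80, 85], [110, 108]]
    print("ICC(3,1):", round(icc_consistency(toy), 3))
```

A consistency-type coefficient ignores a uniform shift between occasions, so a retest in which every student scores exactly one point higher still yields a value of 1.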
According to the Cronbach's alpha-if-item-deleted values, no item needed to be removed from the scale. When assessing the scale's internal consistency, it is recommended to calculate Cronbach's alpha values for each dimension and the overall scale [45]. A Cronbach's alpha value of at least 0.70 is recommended for acceptable internal consistency [35]. In the current study, the scale's Cronbach's alpha values were 0.94 for the whole scale and between 0.75 and 0.80 for the subdomains. The internal consistency of the original scale dimensions was between 0.60 and 0.84 [28]. According to all these findings, we can state that the whole scale and its domains are reliable. In various studies using different professionalism scales, internal consistencies ranged between 0.71 and 0.86 [43,46,47].
Examining the studies using the PAS scale in the literature, the total score was 90.9±8.9 in the study of Klemenc-Ketis & Vrecko and 92.6±6.1 in the study of Selic et al. [28,46]. In our study, attitude scores were higher than those in these studies. This suggests that our students are aware of medical professionalism and have a positive attitude toward professionalism.
Various studies have reported that women's professionalism attitude scores are higher than men's [24,26,27,47]. However, in the current study, no significant difference was found between the sexes.
Empathy was reported as the main factor in the original scale, as in our study, and this result was associated with the high proportion of women in that study's sample [28]. However, since the proportions of male and female students in our study were similar, we cannot infer such a relationship. In addition to the differences in medical school curricula in countries where the studies were conducted, the culture and environment may also have influenced these differences.
Our study used the PSCOM-SF as an external scale for measuring professionalism attitudes to evaluate the scale's convergent validity. A positive and significant correlation was found between the scores of both scales. This result shows that both scales measure the same characteristics. A previous study reported that the attitude scores of students who prefer medical school because it is ideal and to help people are higher than those of students who choose it for other reasons [26]. Our study confirmed this result with both the PAS-TR and PSCOM-SF scales. This finding suggests that making conscious choices significantly impacts students' professional attitudes.
The results showed that the PAS-TR is a valid and reliable scale for evaluating the professionalism attitudes of medical students. It was observed that the internal consistency of the PAS-TR was high, and it provided criterion validity. The scale covers the main factors related to medical professionalism. Psychometrically, the three-dimensional structure of the scale was confirmed with adequate fit values. This shows that the scale is a good measurement tool for determining the professionalism attitudes of medical students. The scale can be used to evaluate medical students' professional attitudes, in formative assessments, in determining the effects of time and education on professional attitudes, and in students' self-evaluation.

Limitations and challenges in using the Professionalism Assessment Scale in medical students
There are several potential limitations and challenges in using the Professionalism Assessment Scale (PAS) in medical students. Some of these may include the following:
Limited research: The PAS has not been extensively studied, particularly in the context of medical education. More research is needed to fully understand the scale's reliability and validity in this population.
Subjectivity: The PAS relies on ratings of professionalism by individuals, which can be subjective. Different raters may have different perceptions of what constitutes professionalism, which could affect the scale results.
Cultural differences: The PAS may not be fully applicable to medical students from other cultural backgrounds. It may be necessary to adapt the scale or develop a new measure to assess professionalism in these students.
Respondent burden: The PAS is a long scale with many items, which may be burdensome for respondents to complete. This could affect the reliability and validity of the results.
Limited focus on specific domains of professionalism: The PAS assesses several domains of professionalism, but it may not capture all aspects of this complex construct.

Study strengths and limitations
The study's strength is that it provides researchers with an instrument with demonstrated validity and reliability, as well as international comparability of results. In addition, the PAS is shorter than the other scales available in Turkish for measuring professionalism. However, the following limitations should be acknowledged as well. It is a cross-sectional study that was conducted on third-year students of a single medical school. The results may not represent all medical students, and their generalizability is limited. Another limitation is the low number of students who participated in the retest.

Conclusions
This study showed that the PAS-TR is valid and reliable in measuring the professionalism attitudes of medical students. During the validation process, the 22-item, three-factor structure of the original version of the scale was preserved. Therefore, the scale can be used as a practical tool that has a small number of items and can be answered in a short time in evaluating the professionalism attitudes of medical students. Further studies are needed to determine the use of the scale in medical residents and physicians. It would be beneficial to test the scale in various health professional groups and larger samples. It must be noted that the validity and reliability of the PAS may vary depending on the specific population being studied and the context in which the scale is being used.